MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory
Modarressi, Ali; Köksal, Abdullatif; Imani, Ayyoob; Fayyaz, Mohsen; Schütze, Hinrich
While current large language models (LLMs) demonstrate some capabilities in knowledge-intensive tasks, they are limited by relying on their parameters as an implicit storage mechanism. As a result, they struggle with infrequent knowledge and temporal degradation. In addition, the uninterpretable nature of parametric memorization makes it challenging to understand and prevent hallucination. Parametric memory pools and model editing are only partial solutions. Retrieval Augmented Generation (RAG) – though non-parametric – has its own limitations: it lacks structure, complicates interpretability and makes it hard to effectively manage stored knowledge. In this paper, we introduce MemLLM, a novel method of enhancing LLMs by integrating a structured and explicit read-and-write memory module. MemLLM tackles the aforementioned challenges by enabling dynamic interaction with the memory and improving the LLM's capabilities in using stored knowledge. Our experiments indicate that MemLLM enhances the LLM's performance and interpretability, in language modeling in general and knowledge-intensive tasks in particular. We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation.
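The abstract's central idea, a structured memory the model explicitly reads from and writes to during generation, can be sketched in a few lines. The `TripleMemory` class and its `write`/`read` methods below are illustrative assumptions for a relation-triple store, not the paper's actual memory API or command format:

```python
# Minimal sketch of an explicit read-write memory an LLM could interact
# with. The interface here (TripleMemory, write, read) is a hypothetical
# stand-in; MemLLM's real command syntax may differ.
from collections import defaultdict

class TripleMemory:
    """Structured memory holding (subject, relation, object) triples."""
    def __init__(self):
        self.by_subject = defaultdict(set)

    def write(self, subj, rel, obj):
        """Write pass: store an extracted fact as an explicit triple."""
        self.by_subject[subj].add((subj, rel, obj))

    def read(self, subj, rel=None):
        """Read pass: retrieve stored facts matching a partial query."""
        hits = self.by_subject.get(subj, set())
        if rel is not None:
            hits = {t for t in hits if t[1] == rel}
        return sorted(hits)

mem = TripleMemory()
mem.write("Marie Curie", "born_in", "Warsaw")   # fact written while reading text
print(mem.read("Marie Curie", "born_in"))       # fact read back before generating
```

Because the store is explicit and symbolic rather than parametric, its contents can be inspected, edited, or deleted directly, which is the interpretability advantage the abstract claims over RAG and parametric memorization.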
- Asia > China > Hong Kong (0.05)
- North America > United States > New York > Chemung County (0.04)
- Asia > Singapore (0.04)
- (16 more...)
Learning to Plan for Language Modeling from Unlabeled Data
Cornille, Nathan; Moens, Marie-Francine; Mai, Florian
By training to predict the next token in an unlabeled corpus, large language models learn to perform many tasks without any labeled data. However, their next-token-prediction objective arguably limits their performance in scenarios that require planning, such as writing a coherent article. In this paper, we train a module for planning the future writing process via a self-supervised learning objective. By conditioning on generated latent plans, our model extends the successful language model formula to more abstract planning in an unsupervised way. Empirically, we demonstrate that our method improves language modeling performance in general, particularly with respect to text structure. Because our framework uses a planner module that is unsupervised and external to the language model, new planner modules can be trained at large scale and easily be shared with the community.
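A rough sketch of the self-supervised planning recipe the abstract describes: derive abstract writing actions from unlabeled text by clustering sentence embeddings, train a planner over the resulting action sequences, and condition the language model on the predicted plan. The k-means action inventory, bigram planner, and concatenation-based conditioning below are simplifying assumptions for illustration, not the paper's exact design:

```python
# Toy end-to-end sketch: unlabeled text -> writing-action inventory ->
# planner -> plan-conditioned LM input. All modeling choices here are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in sentence embeddings for an unlabeled corpus (no labels needed).
sentences = rng.normal(size=(200, 16))

# 1) Self-supervised action inventory: k-means over sentence embeddings.
k = 4
centroids = sentences[rng.choice(len(sentences), k, replace=False)]
for _ in range(10):
    labels = np.argmin(((sentences[:, None] - centroids) ** 2).sum(-1), axis=1)
    centroids = np.stack([
        sentences[labels == j].mean(0) if (labels == j).any() else centroids[j]
        for j in range(k)
    ])

# 2) Planner: predict the next sentence's action from the current one
#    (here a smoothed bigram model over the action sequence).
trans = np.ones((k, k))                       # add-one smoothing
for a, b in zip(labels[:-1], labels[1:]):
    trans[a, b] += 1
trans /= trans.sum(1, keepdims=True)

# 3) Conditioning: append the planned action's centroid to the LM input,
#    biasing generation toward the predicted abstract writing action.
planned = int(np.argmax(trans[labels[-1]]))
lm_input = np.concatenate([sentences[-1], centroids[planned]])
print("planned writing action:", planned, "conditioned input shape:", lm_input.shape)
```

Because the planner is external to the language model, as the abstract notes, the action inventory and planner (steps 1 and 2) can be retrained or swapped without touching the LM itself.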
- Africa > Ethiopia (0.14)
- North America > United States > Pennsylvania > Berks County > Reading (0.04)
- Europe > Italy (0.04)
- (19 more...)
- Media (1.00)
- Leisure & Entertainment (1.00)
- Education > Educational Setting (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)